{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<b><h1><center>PARTICLE SWARM OPTIMIZATION FOR FEATURE SELECTION IN CLASSIFICATION</center></h1></b>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"res/pso.gif\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>In computational science, <b>particle swarm optimization (PSO)</b> is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formulae over the particle's position and velocity. Each particle's movement is influenced by its local best known position, but is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions.</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>PSO is initialized with a group of random particles (solutions) and then searches for optima by updating generations. In every iteration, each particle is updated by following two \"best\" values. The first one is the best solution (fitness) it has achieved so far. (The fitness value is also stored.) This value is called pbest. Another \"best\" value that is tracked by the particle swarm optimizer is the best value, obtained so far by any particle in the population. This best value is a global best and called gbest. When a particle takes part of the population as its topological neighbors, the best value is a local best and is called lbest.</center>\n",
    "\n",
    "<br>\n",
    "<center>After finding the two best values, the particle updates its velocity and positions with following equation (a) and (b).</center>\n",
    "\n",
    "<br>\n",
    "<center>$ v[] = v[] + c1 * rand() * (pbest[] - present[]) + c2 * rand() * (gbest[] - present[]) (a) $</center>\n",
    "\n",
    "<br>\n",
    "<center>$ present[] = persent[] + v[] (b) $</center>\n",
    "\n",
    "<br>\n",
    "<center>v[] is the particle velocity, persent[] is the current particle (solution). pbest[] and gbest[] are defined as stated before. rand () is a random number between (0,1). c1, c2 are learning factors. usually c1 = c2 = 2.</center>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><h2> Application of feature selection using PSO </h2>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><img src=\"res/psofs.png\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center> Fig: The evolutionary training process of a PSO based feature selection algorithm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center> <h3> Representation of feature set for PSO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center>\n",
    "In PSO for feature selection, the representation of a particle is a n-bit string, where n is the total number of features in the dataset. The position value in the dth dimension (i.e. x<sub>id</sub>) is in [0,1], which shows the probability of the dth feature being selected. A threshold θ is used to determine whether a feature is selected or not. If x<sub>id</sub> > θ , the d<sup>th</sup> feature is selected. Otherwise, the d<sup>th</sup> feature is not selected."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><h1>Application of wrapper based feature selection using PSO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<center><h9> This project was done by<b> Himangshu Shekhar Baruah</b>, Gauhati University and <b>Jyotishman Thakur</b>,Gauhati University under the guidance of Mr. Nazrul Haque, Assistant Professor, Tezpur University"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5>Importing the required packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import random as rand"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5>Importing the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(351, 35)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\n",
    "df=pd.read_csv('ionosphere_csv.csv')\n",
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5>Initializing the hyperparameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "w=1\n",
    "c1=2\n",
    "c2=2\n",
    "popsize=50\n",
    "gbestfitness=1\n",
    "gbest=np.random.uniform(0,0,34)\n",
    "itr=10000"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5>Selecting the class label and feature set and finding mean of the mutual informations"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "y=df['class']\n",
    "x=df.drop(['class'],axis=1)\n",
    "\n",
    "from sklearn.feature_selection import mutual_info_classif\n",
    "\n",
    "feature_scores = mutual_info_classif(x, y)\n",
    "\n",
    "mean=np.mean(feature_scores)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5>Initialization of a particle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.\n",
      "  FutureWarning)\n",
      "/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.\n",
      "  FutureWarning)\n"
     ]
    }
   ],
   "source": [
    "class Particle:\n",
    "    #POSITION\n",
    "    pos=np.random.uniform(0,1,34)\n",
    "    #VELOCITY\n",
    "    vel=np.random.uniform(0,0,34)\n",
    "    \n",
    "    bits=np.random.uniform(0,0,34)\n",
    "    \n",
    "    #convert to binary string\n",
    "    for i in range(0,34):\n",
    "        bits[i]=pos[i]\n",
    "        if bits[i]>mean:\n",
    "            bits[i]=1\n",
    "        else:\n",
    "            bits[i]=0\n",
    "    bits=bits.astype(int)\n",
    "    #selecting the feature set\n",
    "    lst=[]\n",
    "    for i in range(0,34):\n",
    "        if bits[i]==0:\n",
    "            lst.append(i)\n",
    "    xPSO=x.drop(x.columns[lst],axis=1) #selected feature set\n",
    "    \n",
    "    #Test train split\n",
    "    from sklearn.model_selection import train_test_split\n",
    "    train_x, test_x, train_y, test_y = train_test_split(xPSO, y, random_state = 0)\n",
    "    \n",
    "     #evaluation of feature set using Logistic Regression\n",
    "    from sklearn.model_selection import train_test_split\n",
    "    train_x, test_x, train_y, test_y = train_test_split(xPSO, y, random_state = 0)\n",
    "    from sklearn.linear_model import LogisticRegression\n",
    "    from sklearn.metrics import confusion_matrix\n",
    "    model=LogisticRegression()\n",
    "    model.fit(train_x,train_y)\n",
    "    model.score(test_x,test_y)\n",
    "    predicted = model.predict(test_x)\n",
    "    matrix = confusion_matrix(test_y, predicted)\n",
    "    \n",
    "    #confusion matrix\n",
    "    TP=matrix[0][0]\n",
    "    FP=matrix[0][1]\n",
    "    FN=matrix[1][0]\n",
    "    TN=matrix[1][1]\n",
    "    \n",
    "    #FITNESS\n",
    "    fitness=(FP+FN)/(TP+FP+FN+TN)\n",
    "    \n",
    "    #PERSONAL BEST\n",
    "    pbest=pos\n",
    "    \n",
    "    pbbits=np.random.uniform(0,0,34)\n",
    "    \n",
    "    #convert to binary string\n",
    "    for i in range(0,34):\n",
    "        pbbits[i]=pbest[i]\n",
    "        if pbbits[i]>mean:\n",
    "            pbbits[i]=1\n",
    "        else:\n",
    "            pbbits[i]=0\n",
    "    pbbits=pbbits.astype(int)\n",
    "    #selecting the feature set\n",
    "    lst=[]\n",
    "    for i in range(0,34):\n",
    "        if pbbits[i]==0:\n",
    "            lst.append(i)\n",
    "    pbxPSO=x.drop(x.columns[lst],axis=1) #selected feature set\n",
    "    \n",
    "    #Test train split\n",
    "    from sklearn.model_selection import train_test_split\n",
    "    train_x, test_x, train_y, test_y = train_test_split(pbxPSO, y, random_state = 0)\n",
    "    \n",
    "     #evaluation of feature set using Logistic Regression\n",
    "    from sklearn.model_selection import train_test_split\n",
    "    train_x, test_x, train_y, test_y = train_test_split(pbxPSO, y, random_state = 0)\n",
    "    from sklearn.linear_model import LogisticRegression\n",
    "    from sklearn.metrics import confusion_matrix\n",
    "    pbmodel=LogisticRegression()\n",
    "    pbmodel.fit(train_x,train_y)\n",
    "    pbmodel.score(test_x,test_y)\n",
    "    pbpredicted = pbmodel.predict(test_x)\n",
    "    pbmatrix = confusion_matrix(test_y, pbpredicted)\n",
    "    \n",
    "    #confusion matrix\n",
    "    pbTP=pbmatrix[0][0]\n",
    "    pbFP=pbmatrix[0][1]\n",
    "    pbFN=pbmatrix[1][0]\n",
    "    pbTN=pbmatrix[1][1]\n",
    "    \n",
    "    #FITNESS\n",
    "    pbfitness=(pbFP+pbFN)/(pbTP+pbFP+pbFN+pbTN)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5> Creating the Swarm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "par=list()\n",
    "for i in range(0,popsize):\n",
    "    par.append(Particle())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5>PSO Algorithm"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "for i in range (0,itr):\n",
    "    for i in range(0,popsize):\n",
    "        if par[i].fitness < par[i].pbfitness:\n",
    "            par[i].pbest=par[i].pos\n",
    "        \n",
    "        if par[i].pbfitness < gbestfitness:\n",
    "            gbest=par[i].pbest\n",
    "            gbesttfitness=par[i].pbfitness\n",
    "    for i in range(0,popsize):\n",
    "        par[i].vel=par[i].vel + c1*rand.random()*(par[i].pbest - par[i].pos) + c2*rand.random()*(gbest - par[i].pos)\n",
    "        par[i].pos=par[i].pos+par[i].vel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h5> Gbest and selected Feature Set with evaluation metrics"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.00269274 0.23523322 0.75301693 0.8096505  0.97260097 0.27305119\n",
      " 0.5257951  0.72169683 0.81104719 0.98157818 0.37503834 0.31827745\n",
      " 0.53380666 0.96233472 0.77800299 0.07322388 0.42918699 0.15207455\n",
      " 0.04402728 0.14384913 0.01429594 0.86243715 0.45987949 0.04660869\n",
      " 0.47418703 0.44270455 0.25971285 0.68120353 0.10369029 0.21124851\n",
      " 0.57418309 0.87819869 0.64173189 0.1461749 ]\n",
      "accuracy = 0.8863636363636364\n",
      "precision = 0.7368421052631579\n",
      "recall = 1.0\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/anaconda3/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:433: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.\n",
      "  FutureWarning)\n"
     ]
    }
   ],
   "source": [
    "print(gbest)\n",
    "gbbits=np.random.uniform(0,0,34)\n",
    "for i in range(0,34):\n",
    "    gbbits[i]=gbest[i]\n",
    "    \n",
    "    if gbbits[i]>mean:\n",
    "        gbbits[i]=1\n",
    "    else:\n",
    "        gbbits[i]=0\n",
    "    #gbbits=gbbits.astype(int)\n",
    "    #selecting the feature set\n",
    "    lst1=[]\n",
    "    for i in range(0,34):\n",
    "        if gbbits[i]==0:\n",
    "            lst1.append(i)\n",
    "    gbxPSO=x.drop(x.columns[lst1],axis=1) #selected feature set\n",
    "    \n",
    "#Evaluation of the selected feature set\n",
    "from sklearn.model_selection import train_test_split\n",
    "train_x, test_x, train_y, test_y = train_test_split(gbxPSO, y, random_state = 0)\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "from sklearn.metrics import confusion_matrix\n",
    "gbmodel=LogisticRegression()\n",
    "gbmodel.fit(train_x,train_y)\n",
    "gbmodel.score(test_x,test_y)\n",
    "gbpredicted = gbmodel.predict(test_x)\n",
    "gbmatrix = confusion_matrix(test_y, gbpredicted)\n",
    "    \n",
    "#confusion matrix\n",
    "gbTP=gbmatrix[0][0]\n",
    "gbFP=gbmatrix[0][1]\n",
    "gbFN=gbmatrix[1][0]\n",
    "gbTN=gbmatrix[1][1]\n",
    "    \n",
    "#Accuracy\n",
    "accuracy=(gbTP+gbTN)/(gbTP+gbTN+gbFP+gbFN)\n",
    "print(\"accuracy = \"+accuracy.astype(str))\n",
    "    \n",
    "#Precision\n",
    "precision=gbTP/(gbTP+gbFP)\n",
    "print(\"precision = \"+precision.astype(str))\n",
    "    \n",
    "#Recall\n",
    "recall=gbTP/(gbTP+gbFN)\n",
    "print(\"recall = \"+recall.astype(str))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
